
Add receiver downscale endpoint #88

Open

yuchen-db wants to merge 17 commits into db_main from yuchen-db/scaledown-with-operator
Conversation


@yuchen-db commented Oct 11, 2024

The endpoint exposes the number of TSDBs; the rollout operator will patch the StatefulSet if n_tsdb=0.
Related PRs:
https://github.com/databricks/universe/pull/753653
databricks/rollout-operator#7

@yuchen-db changed the title from "update" to "downscale" Oct 12, 2024
@yuchen-db changed the title from "downscale" to "db downscale" Oct 14, 2024
@yuchen-db changed the title from "db downscale" to "Add receiver downscale endpoint" Oct 14, 2024
@yuchen-db requested review from a team, christopherzli, jnyi, hczhu-db and yulong-db, and removed the review request for a team, October 14, 2024 23:05
@@ -17,6 +15,11 @@ import (
toolkit_web "github.com/prometheus/exporter-toolkit/web"
"golang.org/x/net/http2"
"golang.org/x/net/http2/h2c"
"net/http"
@yuchen-db (Author):

refactored

@@ -311,6 +311,8 @@ func runReceive(
httpserver.WithGracePeriod(time.Duration(*conf.httpGracePeriod)),
httpserver.WithTLSConfig(*conf.httpTLSConfig),
)
var lastDownscalePrepareTimestamp *int64 = nil
Collaborator:

This implementation assumes the pod won't restart. To preserve the state, we might need to persist it on the local PVC too. Any thoughts?

@yuchen-db (Author) Oct 15, 2024:

Yes, but it's safe to lose this state. The rollout operator will only downscale if the returned timestamp is more than 30 seconds old. If we lose this state, the endpoint returns the current timestamp and a downscale will not happen accidentally. In the worst case, a scheduled downscale is delayed by 30 seconds.

@@ -106,6 +106,14 @@ func (t *MultiTSDB) SkipMatchExternalLabels() {
t.skipMatchExternalLabels = true
}

func (t *MultiTSDB) GetTenants() map[string]*tenant {
return t.tenants
Collaborator:

This map is protected by mtx. A call site of GetTenants() can hold a pointer to the map and do reads/writes while the map is being updated by other goroutines. That would be a data race. It's safer to return a deep copy of the map if copying is not too expensive.
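The suggested fix could look like the sketch below: a copy of the map built under the read lock, so callers can never race with concurrent updates to the map itself. The tenant type here is a placeholder for the real one.

```go
package main

import (
	"fmt"
	"sync"
)

// tenant stands in for MultiTSDB's real tenant type; fields are illustrative.
type tenant struct{ id string }

type MultiTSDB struct {
	mtx     sync.RWMutex
	tenants map[string]*tenant
}

// GetTenants returns a copy of the tenants map so callers cannot race with
// concurrent updates to the map. Note this is a shallow copy: the *tenant
// values are still shared, so copying those too would be the full "deep
// copy" option if it is cheap enough.
func (t *MultiTSDB) GetTenants() map[string]*tenant {
	t.mtx.RLock()
	defer t.mtx.RUnlock()
	out := make(map[string]*tenant, len(t.tenants))
	for k, v := range t.tenants {
		out[k] = v
	}
	return out
}

func main() {
	m := &MultiTSDB{tenants: map[string]*tenant{"a": {id: "a"}}}
	cp := m.GetTenants()
	delete(cp, "a") // mutating the copy does not affect the original
	fmt.Println(len(m.GetTenants())) // 1
}
```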

@yuchen-db (Author):

refactored code

func RegisterDownscale[K comparable, V any](s *Server, m map[K]V, mtx *sync.RWMutex, t *int64) {
s.mux.Handle("/-/downscale", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
mtx.RLock()
n := len(m)
Collaborator:

Don't pass in a map if only the size of the map is needed. Pass in the size of the map.

@yuchen-db (Author):

The size of the map changes dynamically during the handler's runtime, and Go passes maps by reference.

@yuchen-db (Author):

refactored code

@yuchen-db yuchen-db force-pushed the yuchen-db/scaledown-with-operator branch from cf878f5 to ea0d891 Compare October 22, 2024 05:51
3 participants